The performance of parallel matrix algorithms on a broadcast-based architecture

Authors

Constantine Katsinis

Diana Hecht

Ming Zhu

Harsha Narravula

Abstract

Due to advances in fiber optics and VLSI technology, interconnection networks that allow multiple simultaneous broadcasts are becoming feasible. This paper summarizes one such multiprocessor architecture, the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus). It also presents enhancements to the network interface and to the cache and directory controllers. These enhancements support cache block combining, capture and prefetch, and they allow processing time to overlap completely with the communication time caused by compulsory misses. The paper uses two fundamental matrix algorithms to characterize the performance impact of each enhancement. Cache miss analysis and results from running these programs on a SOME-Bus simulator show the effect of block capture and prefetch. Combined with an effective block replacement policy, they significantly reduce the miss rate due to compulsory misses as the cache size increases. A similar increase of cache size in traditional architectures leaves the miss rate due to compulsory misses unaffected.
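The effect of block capture on compulsory misses can be illustrated with a deliberately simplified, hypothetical model. This is not the authors' simulator. The model makes several assumptions:

- Each node computes a row block of C = A·B, so every node must read all of B.
- Every block sent in reply to a miss is visible to all nodes, as on a broadcast network like the SOME-Bus.
- Caches are large enough to keep every captured block, so there are no evictions.
- Only misses on the shared matrix B are counted.

The function name and parameters are illustrative choices.

```python
"""Hypothetical model of compulsory misses on the shared matrix B when each
node computes a row block of C = A @ B, with and without broadcast block
capture. Assumes caches never evict (large enough for every captured block)."""


def compulsory_misses(n: int, nodes: int, block_elems: int, capture: bool) -> int:
    """Count compulsory misses on an n x n matrix B split into cache blocks.

    capture=False: traditional behavior, each node misses once per block.
    capture=True:  a block returned on the broadcast medium is snooped into
                   every node's cache, so later readers hit instead of missing.
    """
    total_blocks = (n * n + block_elems - 1) // block_elems  # ceil division
    caches = [set() for _ in range(nodes)]
    misses = 0
    # Every node sweeps all of B; nodes advance block by block in lockstep.
    for blk in range(total_blocks):
        for node in range(nodes):
            if blk in caches[node]:
                continue  # hit: the block was already fetched or captured
            misses += 1
            caches[node].add(blk)
            if capture:
                # All nodes listen to the channel carrying the reply and
                # capture the block into their own caches.
                for other in range(nodes):
                    caches[other].add(blk)
    return misses


if __name__ == "__main__":
    for capture in (False, True):
        print(capture, compulsory_misses(64, 8, 8, capture))
    # False 4096
    # True 512
```

Without capture, the compulsory miss count grows with the number of nodes that read B. With capture, each block of B misses only once system-wide. The model omits finite caches and the replacement policy. The abstract identifies those as the factors that decide how much of this benefit survives in practice.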


Similar resources

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure. A method that can run on all structures, especially the Fibonacci Hypercube, is therefore necessary for parallel matr...


3-RPS Parallel Manipulator Dynamical Modelling and Control Based on SMC and FL Methods

In this paper, a dynamical model-based SMC (Sliding Mode Control) is proposed for trajectory tracking of a 3-RPS (Revolute, Prismatic, Spherical) parallel manipulator. Ignoring the small inertial effects of all legs and joints compared with those of the end-effector of the 3-RPS, the dynamical model of the manipulator is developed using the Lagrange method. By removing the unknown Lagrange multipliers...


The Impact of Network Architecture in Cluster Parallel Algorithms Design: Matrix Multiplication on Infiniband

Ethernet, which is based on a shared bus, has been a standard technology for cluster interconnection. This technology influences the kind of messages used to optimize parallel algorithms on clusters. Point-to-point messages are used only when necessary, since collective communications (broadcasts, more specifically) are more efficient. The emergence of Infiniband as network te...


Design and Training of Artificial Neural Networks Using an Evolution Strategy with Parallel Populations

Application of artificial neural networks (ANN) in areas such as classification of images and audio signals shows the ability of this artificial intelligence technique for solving practical problems. Construction and training of ANNs is usually a time-consuming and hard process. A suitable neural model must be able to learn the training data and also have the generalization ability. In this pap...


Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)

Although sparse matrix-vector multiplication (SPMV) algorithms are simple, they form important parts of linear algebra algorithms in mathematics and physics. Because these algorithms can run in parallel, Graphics Processing Units (GPUs) have been considered among the best candidates to run them. In recent years, power consumption has been considered as one of the metr...


Two Strategies Based on Meta-Heuristic Algorithms for Parallel Row Ordering Problem (PROP)

Proper facility layout is a key management issue that influences the efficiency and profitability of manufacturing systems. The Parallel Row Ordering Problem (PROP) is a special case of the facility layout problem. It consists of finding the best location of n facilities, where similar facilities (facilities that have some characteristics in common) should be arranged in a row ...



Journal:

Concurrency and Computation: Practice and Experience

Volume 18, Issue

Pages -

Publication date: 2006
